Multi-class Text Categorization with Error Correcting Codes

Author

  • Gerhard Paass
Abstract

Automatic text categorization has become a vital topic in many applications. Imagine, for example, the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding scheme for classification needs resources that increase linearly with the number of classes. A different solution uses an error correcting code whose length increases only with O(log2(n)). In this paper we investigate the potential of error correcting codes for text categorization with many categories. The main result is that multi-class codes have advantages for classes which comprise only a small fraction of the data.
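The contrast between 1-of-n coding and an error correcting code can be illustrated with a small sketch. This is not the paper's implementation: the class names, the 7-bit codebook (four codewords with pairwise Hamming distance 4), and the helper names `minimal_code_length`, `hamming`, and `decode` are all assumptions chosen for illustration. Each bit position stands for one binary classifier, and decoding picks the class whose codeword is closest to the predicted bits.

```python
def minimal_code_length(n_classes: int) -> int:
    """Fewest bits that give each class a distinct codeword: ceil(log2(n))."""
    return max(1, (n_classes - 1).bit_length())


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: str, codebook: dict) -> str:
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Hypothetical codebook: 4 classes, 7 bits each, pairwise distance 4,
# so any single wrong binary classifier can still be corrected.
CODEBOOK = {
    "sports":   "0000000",
    "politics": "1010101",
    "science":  "0110011",
    "business": "1100110",
}

if __name__ == "__main__":
    # 1-of-n needs one output per class; a minimal binary code needs far fewer.
    for n in (4, 1000):
        print(n, "classes:", n, "outputs (1-of-n) vs", minimal_code_length(n), "bits")
    # The last classifier is wrong (1 flipped to 0), yet decoding recovers the class.
    print(decode("1010100", CODEBOOK))
```

The codebook is deliberately longer than the two bits that four classes strictly need: the extra bits are the redundancy that lets decoding tolerate classifier errors.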

Similar resources

Multi-class Classification with Error Correcting Codes

Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. I...

Error-Correcting Output Codes for Multi-Label Text Categorization

When a sample belongs to more than one label from a set of available classes, the classification problem (known as multi-label classification) becomes more complicated. Text data, widely available nowadays on the World Wide Web, is an obvious example of such a task. This paper presents a new method for multi-label text categorization created by modifying the Error-Correcting Output...

Ranking Error-Correcting Output Codes for Class Retrieval

Error-Correcting Output Codes (ECOC) is a general framework for combining binary classification in order to address the multi-class categorization problem. In this paper, we include contextual and semantic information in the decoding process of the ECOC framework, defining an ECOC-rank methodology. Altering the ECOC output values by means of the adjacency of classes based on features and class ...

Loss-Weighted Decoding for Error-Correcting Output Coding

Multi-class classification is a challenging problem for several applications in Computer Vision. The Error Correcting Output Codes (ECOC) technique represents a general framework capable of extending any binary classification process to the multi-class case. In this work, we present a novel decoding strategy that takes advantage of the ECOC coding to outperform the up to now existing decoding stra...

N-ary Error Correcting Coding Scheme

The coding matrix design plays a fundamental role in the prediction performance of the error correcting output codes (ECOC)-based multi-class task. In many-class classification problems, e.g., fine-grained categorization, it is difficult to distinguish subtle between-class differences under existing coding schemes due to a limited choice of coding values. In this paper, we investigate whether ...

Publication date: 2000